Porting Smalltalk to the Amstrad CPC: a failed experiment

Posted by pulkomandy on Sat Feb 24 12:38:04 2024 • Comments (0) •

This week project was an experiment with porting Smalltalk to the Amstrad CPC. It didn't work, but it was interesting. So here are my notes.

About Smalltalk

Smalltalk is an object oriented language developed at Xerox PARC. It started in 1969 but the first release outside of PARC was in 1980, and was in fact the third major rewrite of the language.

Smalltalk was part of the "Dynabook" project, which was a research project about imagining the future of computing. The vision for this was a small personal tablet computer (hardware somewhat similar to an iPad), where you could run your own heavily customized system. Of course, this was way out of reach of 1970s technology. But that didn't stop the team. They built an "interim" machine to be the first step towards this: the Xerox Alto.

The Alto is a personal computer, but with a large CRT display and not portable at all. It implements the instruction set from the Data General Nova, an earlier machine that has a rather simple CPU, on top of a microcoded system that allows to reuse the same logic for also driving the display, implementing the disk controller, and a few other things.

The early versions of the hardware didn't have a lot of RAM, but since the project was about designing the future of computing, that was a problem. So, Smalltalk runs in a virtual machine, implemented in software, that implements virtual memory: most of the data is stored on disk, and loaded on-demand when the Smalltalk software needs it, and all of this, without the user/programmer having to think about it.

Why port it to the Amstrad CPC?

The CPC was one of the first computers I had access to. I still own one (several in fact) and still use it from time to time, as well as developping hardware expansions for it. The initial machine has 64K of RAM, later expanded to 128. But mine has a 2MB expansion card connected to it. However, the machine runs a Z80 CPU, which can only access a total of 64K of memory. As a result, extended memory is accessible in banks of 16K, one of which can be accessed by the CPU at a time. There are no languages that can handle this automatically: the BASIC shipped with the machine was never updated, and the SDCC compiler can do it, but in a bit limited way.

Anyway, the BASIC on the CPC is a rather good implementation, but it is still BASIC: not very fast, and not very well suited to structured programming. And C requires a cross-development setup, where the code is written and compiled on another machine, and the CPC is used just to run it. That's a bit sad, the machine should be able to work on its own.

There aren't many languages that can fit in these 64K of memory. Well, you can go with Pascal, and in fact there are pretty good versions of it for CP/M. You can go with C, but the Z80 CPU is not very well suited to a typical C compiler, and so the generated assembler code is messy and not efficient, and, more importantly, there is no hope for a good C compiler running directly on the machine (that would be possible with a simpler CPU or maybe if it was more reasonable to use the memory banks).

Moreover, my goal here is to have a language for quick prototyping and experiments. When I need the full performance of the machine, I'm happy with using a C compiler or even hand optimizing some parts in assembler. But this is a bit annoying to do on a larger scale. Essentially, I want something acting as the main interface for the machine, just like BASIC does, but with a more modern language.

This reduces the set of candidates to: BASIC itself, stack based languages such as Forth and LISP (which would both be good choices but are a bit too different from what I'm used to), maybe something like Lua or Wren (but not the existing implementation of Wren, which is based on floating point numbers and NaN tagging, not really suitable for the Amstrad. And there is Smalltalk.

The Smalltalk from Xerox as it was released in 1980 is interesting. The main interpreter could fit in a dozen or so kilobytes of code. That's just the amount we have on a CPC ROM expansion bank, so it could fit in the place occupied by the BASIC. However, Smalltalk is not just a language, it is a complete system, with a GUI, overlapping windows, a built-in debugger, and so on. Well, at least it would be interesting to see all of this running on the CPC, and how slow it would be. But maybe not as a practical thing to use, at least not as a replacement for the BASIC. The Smalltalk world mainly continued from there, and today you can find Squeak and Pharo, which are direct desendants of this version of Smalltalk, but keep getting larger and larger, and are probably much more than the CPC can handle.

Little Smalltalk

So, I had this idea of writing a Smalltalk VM for the CPC in the back of my head for a while. Lately, I looked again for existing Smalltalk implementation that I could maybe use as a starting point. You know, start from working C code, even if slow and inefficient, and gradually rewrite things out in assembler until I have something usable.

These searches brought me to Little Smalltalk. This is a dialect of Smalltalk, not quite compatible with the standard Smalltalk-80. The syntax is very similar, with a few minor differences. But, maybe more interestingly, the GUI environment and built-in debugger and all that stuff is not available. This is a command-line Smalltalk, initially designed for use from a simple serial terminal on a UNIX timeshared computer.

There are four versions of Little Smalltalk, each improving in the previous one and making the code simpler and nicer. Since this is rather simple C code, and it compiles to a reasonably small executable, it was not unreasonable to attempt porting it to the CPC. So I started hacking.

Little Smalltalk is a bit different from Smalltalk-80 in that it is not a completely abstract VM specification. For speed, it retains the pointer size of the host machine and uses that as its main type. With this and a few other tricks, and the removal of the GUI parts, it should be quite simpler than Smalltalk-80. Simple enough that I could attempt running it in a few evenings of work.

I started by compiling all 4 versions on my PC, and looking at the code size for each. Version 4 is indeed the smaller one, so I started with that. It also is the only one not using K&R style function prototypes, which is good for me, because SDCC (the C compiler I'm going to use) doesn't implement those.

From the start I noticed that the "image" size (the Smalltalk bytecode) was too large to fit in RAM, but I thought this was because of building it for a 64bit machine in my tests, and the final version would be considerably smaller.

A note on running C on the Amstrad CPC

There are various tutorials for this out there, but the ones I found were slightly out of date. So, I'm going to share my own files here, and explain what I needed to change in the Little Smalltalk implementation to get it running as much as I did (not enough for anything usable).

C application startup

If you know something about C, you know that execution of a C program starts with the main() function. Well, that isn't entirely true. Before this can be run, the code should make sure that all global and static variables are initialized to the correct values. In SDCC, this is implemented by the GSINIT code segment. To make sure this code is run before everything else, we need a little bit of assembler code, usually called CRT0. SDCC ships with a CRT0 meant for their own Z80 simulator platform, but that is not quite suitable for the CPC. So, here is my version suitable for the CPC.

  ;; This is the CRT0 module. Declaring this will make sure it shows properly
  ;; in .map files and other places.
  .module crt0

  ;; Note that the main function is a global that can be found elsewhere.
  ;; The assembler then knows how to emit a reference to it for the linker to
  ;; handle. Note that C functions are translated to assembler labels with an
  ;; extra _ in front of them.
  .globl  _main

  ;; The output file starts with a HEADER section, that is not loaded. In theory
  ;; we could put a standard AMSDOS header here, but I chose to do that with a
  ;; separate tool instead.
  .area _HEADER

  ;; Then we have all the sections with read-only code and data
  ;; GSINIT and GSFINAL are for the initialization code run before main (see
  ;; explanation below where they are filled in).
  ;; HOME and CODE is for the code and constants
  ;; INITIALIZER and INITIALIZED are for global and static variables initialized
  ;; with some value in the code. The GSINIT code should copy them from INITIALIZER
  ;; (in ROM) to INITIALIZED (in RAM) so they have the right value. In our case,
  ;; all the code and data is loaded in RAM, so we don't really bother with that.
  ;; DATA and BSS are the RAM areas used by the software. BSS should be cleared
  ;; to 0 during program startup.
  ;; Finally there is the HEAP, this is the area used for dynamic allocations.
  ;; This could be SDCC's malloc, but it is not a great implementation, in our
  ;; case we will use a custom and simplified allocator that can't free any memory.
  ;;
  ;; Each compilation unit (.rel file) can add data independently to each of these
  ;; sections. Then, the linker concatenates the section from each of them
  ;; (for example, all the GSINITs together) to generate the final output file.
  .area   _GSINIT
  .area   _GSFINAL
  .area   _HOME
  .area   _CODE
  .area   _INITIALIZER

  .area _INITIALIZED
  .area   _DATA
  .area _BSEG
  .area   _BSS

  ;; Now that we have defined all sections and in what order we want them (well,
  ;; in theory, but there seem to be other things messing with this order currently),
  ;; we can start filling in these sections with data or define symbols that point
  ;; to important parts of the memory, so these symbols can be accessed by the
  ;; C code (there will be examples of this later in the article).
  .area   _HEAP
  ;; Define a symbol for the "heap" start. This is the first address not used by SDCC
  ;; generated code, so anything above that is free for our own needs. Our simplified
  ;; memory allocator will make use of it.
_heap_start::

  .area   _GSINIT
gsinit::
  ;; All code that needs to be run at global init (before entering main) is inserted here by SDCC
  ;; Each compilation unit will generate little bits of code for each thing that needs initialization
  ;; (for example if you initialize a global or static variable with some complex expression).
  .area   _GSFINAL
  ;; The GSFINAL area is inserted after the GSINIT one. Usually nothing else gets added here, but
  ;; it could be used for example to implement atexit() or similar (code to run after main has
  ;; returned).
  ;; So, after the GSINIT code is done, we can call into the main function. We use JP to do a tail-call
  ;; optimization, the main function will return directly to the caller (to the BASIC interpreter for example).
  jp      _main

  .area   _CODE
  ;; Finally we need some code in the CODE segment. This will be put at the start
  ;; of the output file at 0x100 (see the linker script). For simplicity (since SDCC
  ;; does not generate the AMSDOS header for us), we put the entry point at that address.
  ;; TODO; it would be even better if the GSINIT and GSFINAL code was already at 0x100,
  ;; then we could directly have the program entry point pointing there.
  ;; Another option is to have GSINIT and GSFINAL at the end of the code, and overlapping
  ;; with the HEAP area. Since the init code is run only once, this allows to have this code
  ;; erased after it is run and not needed anymore. Maybe I will write about this
  ;; trick in another article later.
_start::
        jp gsinit

  ;; We also define the exit() function to be an infinite loop, because our code
  ;; expects that to be available.
_exit::
        jr _exit

So, we have a CRT0 file now. How do we use it? Well, we tell the linker about it using a linker script. Here is the one for the ImageBuilder tool (more on what that is in the next section).

-mjwx
-i imagebuilder.ihx
-b _CODE = 0x100
-k /bin/../data/sdcc/lib/z80
-k /packages/sdcc-4.4.0-1/.self/data/sdcc/lib/z80
-l z80
../crt0_cpc.rel
imageBuilder.rel

-e

This generates a file named "imagebuilder.ihx" in Intel hex format. It also forces the _CODE section to start at address 0x100 (this is where we want to load our program). It then declares the Z80 standard libraries, the linker will use these to provide implementations of standard functions (printf, malloc, strcpy, ...) if they are used. And finally, we have our two object files, the CRT0 and the output of the C compiler.

Finally we can tie this all together with a simple build script:

sdcc -c -mz80 --no-std-crt0 --opt-code-size --max-allocs-per-node 100000 imageBuilder.c -I../source
sdldz80 -f lst4.lk
makebin -p -o 0x100 < imagebuilder.ihx > imagebuilder.bin
hideur imagebuilder.bin -o ../IB.BIN -t 2 -x "&100" -l "&100"

The first two steps are compiling the .c file, and linking it using the above linker script. This result in a file in intel hex format, but we need a binary. The makebin tool is used to perform the conversion. Finally, we use hideur maikeur to add an AMSDOS header setting up the load address and entry point. As a result, our executable can be run directly from BASIC. That's all, we're up and running!

Porting ImageBuilder

ImageBuilder is a tool that parses a text description of smalltalk classes and methods, and converts it into a binary smalltalk "Image" containing class and method descriptions, and the bytecode to run for each of the methods. I need to run it on the CPC because the image format is platform specific (mainly, the class and object headers depend on the native pointer size).

Besides the things described in the chapter above, I made a number of changes to the ImageBuilder to get it to run. I believe these can be useful to anyone attempting to port C code to the CPC, so, here is the details of what I changed.

Console output

Of course, we want to see what the program is doing. This is traced by printf statements througout the code. SDCC provides an implementation of printf, but it needs to know how to send the characters to the screen. For this, we need to use the firmware BB5A function. This function takes a parameter in A that is the character to print.

That's a good thing, because that is exactly the calling convention used by SDCC. So, we can just tell SDCC that this function exists at that address. We do this with a little function pointer trick.

#define bb5a ((void(*)(char))(0xBB5A))

What does this do exactly? Well, we define a macro called "bb5a" (this could be any name we want to give to the function). This macro takes the value 0xBB5A, and casts it into a function pointer, to a function that takes a char argument and returns nothing. That's all SDCC needs to know to call that function. Now, when we want to print a character, we can do this:

	bb5a('h');

Unfortunately, that is not enough to get printf to work. SDCC follows the C standard a bit too closely, and needs a putchar function that takes its parameter as a 16-bit value. Also, the code uses \n for newlines, while the CPC needs \r\n. We can take care of these two problems by defining our putchar function this way:

int putchar(int c) {
       if (c == '\n')
               bb5a('\r');
       bb5a(c);
}

File I/O

The ImageBuilder needs an input and an output file (the sourcecode and the generated bytercode image, respectively). The original code is written for UNIX, using FILE* and the corresponding standard APIs. These are not available on the CPC, but instead we have AMSDOS, which, in this case, provides a good enough support for file access. But, it is not designed for accessing it from C, and a few tricks will be needed.

First of all, we need to initialize the disk ROM. On the CPC, when a program is started with the RUN command, the system is de-initialized, and all ROMs are turned off. This means any attempts at disk access would instead try to target the tape drive. Well, technically that would work, but it's easier to use some modern mass storage (in my case, it is an SD card connected to the Albireo mass storage expansion). So, we need a function to re-initialize the disk ROM.

static inline void amsdos_init(void)
{
       __asm
               LD HL, #0xABFF
               LD DE, #0x40
               LD C, #7
               CALL #0xBCCE
       __endasm;
}

I kept this extremely simplified. Normally, this function should also save the current drive number for AMSDOS and restore it after reinitializing the ROM. It should probably not use hardcoded values for HL and DE, but determine where it is safe to put the AMSDOS buffers. Maybe it should init all the ROMs instead of just ROM 7. Anyway, for the ImageBuilder I just wanted to get up and running quickly. So, I call this from the start of the main() function.

Next, we need to open a file for input. Unfortunately, the firmware function does not accept its parameters in the SDCC calling convention. SDCC puts the first 8bit parameter in A, the second parameter in DE, and then everything else on the stack. The CAS_IN_OPEN firmware vector needs the filename length in B, the filename pointer in HL, and a temporary work buffer in DE. So, only DE is at the right place. However, by looking at the generated code, we can see that, to push the filename pointer to the stack, SDCC is also accidentally putting it in HL. I decided to abuse this since it made my life a little easier. So we end up with the following:

static char amsdos_in_open(char fnlength, const char* buffer, const char* const fn)
{
       // A = filename length (we need it in B)
       // DE = buffer
       // on stack = filename (but it is also in HL by accident)
       __asm
               ld b,a
               ;pop hl
               call #0xbc77
       __endasm;

       // TODO error handling
       // Popping is taken care by SDCC
       return 0;
}

While we're at it, the function to open a file for output is pretty similar:

static char amsdos_out_open(int fnlength, const char* buffer, const char* const fn)
{
       __asm
               ld b,a
               ;pop hl
               call #0xbc8c
       __endasm;

       // TODO error handling
       return 0;
}

And we can also add the functions to read and write characters, and close the files when done. These match the SDCC calling convention, so, we handle them like the BB5A function from earlier:

#define amsdos_getc ((char(*)(void))(0xBC80))
#define amsdos_putc ((void(*)(char))(0xBC95))
#define amsdos_in_close ((void(*)(void))(0xBC7A))
#define amsdos_out_close ((void(*)(void))(0xBC8F))

Finally, the original code also uses fgets to read the input file line by line. AMSDOS doesn't natively provide that, but we can easily provide an implementation. We just need to read the file until there is a newline, and make sure to add a NULL terminator after it.

static char amsdos_gets(char* buffer)
{
       char* fin = buffer;
       for (;;) {
               // FIXME test carry instead for returning
               *fin = amsdos_getc();
               if (*fin == '\n') {
                       *++fin = 0;
                       return 1;
               }
               if (*fin == '\0')
                       return 0;
               fin++;
       }
}

This is not really correct, the end of file is not properly detected (in fact, that's a problem from the way amsdos_getc was implemented here). But the work never got as far as reading the file until the end anyways.

Now we can implement our file IO. Note that in AMSDOS there are no file descriptors like in UNIX, there can be only one input and one output file opened at the same time. Fortunately, our needs are even simpler: the ImageBuilder first reads all of the input, closes it, and only then writes the output. So, all that's left to do is declare a working buffer for AMSDOS:

static char amsdos_buffer[2048];

And then we can replace all the calls to fopen, fgetc, fgets, fputc and fclose in the sourcecode.

Memory allocations

After that, we try to run the program and... it immediately fails after trying to allocate some memory. It turns out, SDCC only provides 1000 bytes of memory to use for malloc() by default. That is, of course, ridiculously small. Changing that requires manually recompiling the memory allocator. I have no idea why they made it that way. Anyway, for the vast majority of allocations, the ImageBuilder never frees them. The idea is to keep everything in RAM until the final image can be generated. This means we can use an extremely simple allocator:

static void* memalloc(int size)
{
       extern char heap_start;
       // Make sure heap is aligned since the low bit is used for non-pointer stuff
       static char* ptr = (char*)(((unsigned int)(&heap_start) + 1) & 0xfffe);

       char* allocated = ptr;
       ptr += size;

       return allocated;
}

This allocator works by having a pointer to the start of the free memory, that is simply incremented whenever we need a new allocation. Where is the "start of free memory"? Well, remember in the linker script we declared a "heap_start" label that points there. So, we just need to grab the address of that label. So we declare it as an extern variable, and take its address. We then make sure it is an even address. This is because I suspect some things in Little Smalltalk use the lowest bit for storing a flag, so all allocated addresses must be even.

The code also makes use of strdup, the implementation from SDCC would still want to use its own malloc, so let's replace that as well:

static char* mystrdup(const char* in) { // keep heap aligned int len = strlen(in) + 1; if (len & 1) len++; const char* out = memalloc(len); if (out != NULL) strcpy(out, in); return out; }

Here as well, we take care of keeping the heap aligned. Otherwise, there is nothing special about it.

Finally, there was one place where malloc/free was used, that is in the ClassCommands methods that does something like this:

	char *class, *super, *ivars;

	// Some code here ...

	class = strdup(...);
	super = strdup(...);
	ivars = strdup(...);

	// More code here using those strings...

	free(class); free(super); free(ivars);

This is easily replaced with the following:

	char class[16], super[16], ivars[83];

	// Some code here ...

	strcpy(class, ...);
	strcpy(super, ...);
	strcpy(ivars, ...);

	// More code here using those strings...

And that's all, we can now run the ImageBuilder!

... Or can we? (Running out of memory)

Well, it starts parsing and compiling things, and it seems to work. But after a while, it crashes. Fortunately, I'm running this all in the ACE CPC emulator, which provides a good debugger and allows me to watch the CPU registers in realtime. I see that the code seems to be using quite a lot of stack space, and the SP register (the stack pointer) ends up way lower than it is allowed to be. The CPC firmware only allocates a relatively small space to the stack, since it didn't expect me to run complicated C code like this, with recursion and use of stack buffers to store strings.

That's not really a problem, we can move the stack somewhere else. But where? there is now a lot of things going on, and we don't really have much space to fit it all. Of course, it would be convenient if we could use the display RAM. On the CPC, this uses 16K of RAM, or 1/4 of the available space. Can we put the stack there? Of course there's the problem that we can't use that space for both the stack and for displaying messages. So, let's try to do it only while the parser function with the big recursive calls is running. And we can restore it to its normal location when we are done with the complicated parsing code. Let's add this to the parseBody function:

// Declare a new global variable where we can temporarily store the stack pointer
int savesp;

	// When entering the parseBody function:
	// Disable interrupts (so the firmware doesn't notice us doing strange things with the SP register)
	// Save the old value of SP to a global variable
	// Point SP to the screen RAM
       __asm
               DI
               ld (#_savesp),sp
               ld sp,#0xffff
       __endasm;

	// When exiting the parseBody function:
	// Restore the stack pointer to where it was
       __asm
               ld sp,(#_savesp)
       __endasm;

OK, it is a bit crazy to do this without telling the C compiler about it. But here it works because this function has no parameters and no local variables. And we do these things just at the right places so the compiler does not attempt to access the stack by other ways (such as by using the IX register as the frame pointer).

Now, we can see some glitches on screen while the parser is running, but at least it doesn't erase some important things in memory while running. And this gets us a little bit further into parsing the file...

But how far, exactly?

Unfortunatly, it does not get us far enough. With this setup, and tuning the compiler for optimizations a bit (that's the --max-allocs-per-node setting you saw earlier in the build script), we end up with about 16K of RAM available for the heap. I made some tweaks to other parts of the code, reducing the size of some memory buffers that didn't need to be so large. This heap fills rather fast as the parser creates more and more object declarations and compiles bytecode for several methods from the imageSource definition file. Unfortunately, we run out of memory after compiling barely 10% of the input file. So, a rough estimation is that we'll need 10x as much memory to get through the complete file. There's no way that's going to fit into the Z80 address space. Even if I removed most of the firmware, even if I used all the screen RAM. And then, the resulting image file would still be around 100K. So, the conclusion is that this is no match for the space efficiency of the Amstrad CPC BASIC, unfortunately.

Conclusion

Do I give up on running Smalltalk on the CPC? No, not really. But, it will need a Smalltalk VM that can manage banked memory. That's doable, as I have said earlier in the article, in the Xerox Alto they even did it by swapping things to disk, so, surely, memory banks are not a problem. But, if I'm going to do that, I may as well use the fact that I have 2MB of RAM, and so I can run a complete Smalltalk-80 image. I don't know how ridiculously slow that will be. But in any case, there is little gain from constraining myself to Little Smalltalk. If you're going to need hundreds of kilobytes of RAM for something, you may as well throw in a full-blown GUI in it.

Should I try some other language someday as a possible alternative to BASIC? Well, I'd love to. Ideally the requirements for that would be to use only 16K of code (fitting in a ROM), and maybe a few hundred bytes of RAM, leaving as much as possible free for the user. I think I would be fine with needing to write scripts in a text editor (such as Protext) and not have to use the way it is done in BASIC. Or maybe just a REPL where one can define custom functions and so would be fine. In Little Smalltalk, the editing of functions is left to an external text editor, and when the text editor returns, the resulting text is parsed and turned into bytecode. I think that's an interesting idea. Well, I'll keep looking at other options...

PulkoMandy's homepage